The Federal Statistical Service has been estimating the average real wage in Russia monthly since January 1993 (source: http://sophist.hse.ru/hse/1/tables/WAG_M.htm). We need to forecast it 3 years ahead.
Note that for the first 5 years the behaviour of the series is quite different from what we see later. To keep the model from overfitting to irrelevant data, let's only use the part starting from January 1999. Here it is:
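A sketch of the preparation step (the file name and column name are assumptions; `tSeries` matches the variable used in the model calls below):

```r
library(forecast)
library(tseries)

# Assumed loading step: a CSV export of the WAG_M table from sophist.hse.ru
data <- read.csv("WAG_M.csv")
tSeries <- ts(data$WAG_M, start = c(1993, 1), frequency = 12)

# Drop the first 5 years of structurally different behaviour
tSeries <- window(tSeries, start = c(1999, 1))
plot(tSeries)
```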
STL decomposition:
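The decomposition plot can be produced with `stl` (a sketch):

```r
# Seasonal-trend decomposition with a periodic seasonal component
plot(stl(tSeries, s.window = "periodic"))
```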
Box-Cox transformation with optimal \(\lambda\):
Transformation clearly stabilizes the variance, so it makes sense to use it.
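The optimal \(\lambda\) can be estimated with `BoxCox.lambda`; a sketch (`LambdaOpt` is the name used in the model call shown in the output):

```r
# Optimal Box-Cox lambda (about -0.33 for this series)
LambdaOpt <- BoxCox.lambda(tSeries)
plot(BoxCox(tSeries, LambdaOpt))
```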
## ETS(A,A,A)
##
## Call:
## ets(y = tSeries, lambda = LambdaOpt, biasadj = T)
##
## Box-Cox transformation: lambda= -0.3309
##
## Smoothing parameters:
## alpha = 0.3967
## beta = 0.039
## gamma = 2e-04
##
## Initial states:
## l = 2.2046
## b = 0.0029
## s = 0.0383 -0.0023 -0.0038 -0.0034 -0.0065 -8e-04
## 0.0056 -0.0024 -0.0021 -0.0012 -0.0107 -0.0108
##
## sigma: 0.0043
##
## AIC AICc BIC
## -1482.757 -1480.482 -1420.546
Residuals:
Ljung-Box test p-values for them:
The residuals are autocorrelated: the model does not capture all the structure in the data.
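The p-values above can be computed with `Box.test`; a sketch, assuming the fitted ETS model is stored in `fit` (the name is an assumption):

```r
res <- residuals(fit)  # `fit` holds the ETS model from above
pvals <- sapply(1:24, function(lag)
  Box.test(res, lag = lag, type = "Ljung-Box")$p.value)
plot(1:24, pvals, xlab = "Lag", ylab = "Ljung-Box p-value", ylim = c(0, 1))
abline(h = 0.05, lty = 2)
```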
| Hypothesis | Test | Result | P-value |
|---|---|---|---|
| Normality | Shapiro-Wilk | not rejected | 0.2795714 |
| Unbiasedness | Wilcoxon | not rejected | 0.1139321 |
| Stationarity | KPSS | not rejected | 0.1 |
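The table entries can be reproduced as follows (a sketch; `fit` is an assumed name for the ETS model, and `kpss.test` comes from the `tseries` package):

```r
res <- residuals(fit)
shapiro.test(res)$p.value         # normality
wilcox.test(res)$p.value          # unbiasedness: is the median zero?
tseries::kpss.test(res)$p.value   # stationarity (p-value is capped at 0.1)
```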
Fitting the selected model to the first \(T-D\) points of the series to check the accuracy of the forecast on the last \(D\) points:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.3579105 4.558934 3.278331 -0.1466165 1.780820 0.2430450
## Test set -2.0947836 12.282823 10.030785 -0.9188242 3.604003 0.7436503
## ACF1 Theil's U
## Training set 0.4850088 NA
## Test set 0.7181562 0.3478321
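A sketch of the train-test check, assuming the last \(D = 36\) points (3 years, matching the forecast horizon) are held out:

```r
D <- 36  # hold out the last 3 years (assumption)
trainSeries <- window(tSeries, end   = time(tSeries)[length(tSeries) - D])
testSeries  <- window(tSeries, start = time(tSeries)[length(tSeries) - D + 1])

# Refit the same ETS(A,A,A) specification on the shortened series
fitShort <- ets(trainSeries, model = "AAA", lambda = LambdaOpt, biasadj = TRUE)
accuracy(forecast(fitShort, h = D), testSeries)
```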
Using auto.arima:
## Series: tSeries
## ARIMA(2,1,1)(2,1,1)[12]
## Box Cox transformation: lambda= -0.3308584
##
## Coefficients:
## ar1 ar2 ma1 sar1 sar2 sma1
## -1.2110 -0.2752 0.9543 0.3316 0.1772 -0.8167
## s.e. 0.0749 0.0589 0.0505 0.1540 0.1077 0.1360
##
## sigma^2 = 1.398e-05: log likelihood = 1145.48
## AIC=-2276.96 AICc=-2276.54 BIC=-2251.67
ARIMA(2,1,1)(2,1,1)\(_{12}\) is
selected. Here are the residuals:
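A sketch of the automatic fit and its residual plots (`fitAuto` is an assumed name):

```r
fitAuto <- auto.arima(tSeries, lambda = LambdaOpt, biasadj = TRUE)
resAuto <- residuals(fitAuto)[-(1:12)]  # drop the first 12 residuals
tsdisplay(resAuto)                      # series, ACF and PACF of residuals
```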
At the beginning the residuals are not well defined; let's drop the first 12 before proceeding with the residual analysis.
| Hypothesis | Test | Result | P-value |
|---|---|---|---|
| Normality | Shapiro-Wilk | rejected | \(2.72 \times 10^{-5}\) |
| Unbiasedness | Wilcoxon | rejected | 0.0025092 |
| Stationarity | KPSS | not rejected | 0.1 |
Fitting the selected model to the first \(T-D\) points of the series to check the accuracy of the forecast on the last \(D\) points:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.5668275 3.480322 2.533355 -0.3503315 1.416438 0.1878149
## Test set -14.8105932 21.387429 16.553117 -5.4774917 6.048955 1.2271952
## ACF1 Theil's U
## Training set 0.1398763 NA
## Test set 0.7794353 0.6056356
The series is nonstationary (p<0.01, KPSS test) and clearly
seasonal; let’s do seasonal differencing:
Still not stationary (p<0.01, KPSS test), with a visible trend. Let's apply another, nonseasonal differencing:
The hypothesis of stationarity is now not rejected (p>0.1, KPSS
test).
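The differencing steps and KPSS checks can be sketched as follows (the intermediate variable names are assumptions):

```r
transformed <- BoxCox(tSeries, LambdaOpt)
tseries::kpss.test(transformed)      # p < 0.01: nonstationary
d12 <- diff(transformed, lag = 12)   # seasonal differencing
tseries::kpss.test(d12)              # still p < 0.01
d12.1 <- diff(d12)                   # nonseasonal differencing
tseries::kpss.test(d12.1)            # p > 0.1: stationarity not rejected
```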
Look at ACF and PACF of the obtained series:
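These plots can be produced with `tsdisplay`; the argument is the seasonally and then nonseasonally differenced transformed series (a sketch):

```r
# ACF and PACF up to 5 seasonal periods
tsdisplay(diff(diff(BoxCox(tSeries, LambdaOpt), lag = 12)), lag.max = 5 * 12)
```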
Autocorrelation is significantly different from 0 at lags 1, 8, 9, 10, 11, 12 and 36. Since 36 is the maximal significant seasonal lag, we could use \(Q = 36/12 = 3\) as an initial approximation. The maximal significant lag before 12 is 11, hence the starting value \(q=11\).
Partial autocorrelation is significantly different from 0 at lags 1, 8, 9, 12 and 36. Following the same logic as above, we select the initial values \(P = 36/12 = 3\) and \(p=9\).
Next we’ll look for the best model with auto.arima using d=1, D=1, max.p=10, max.q=10, max.P=4, max.Q=4 (where possible, we added 1 to every initial approximation found above, just in case) and the parameters of the automatic model as the starting points of the search (start.p=2, start.q=1, start.P=2, start.Q=1).
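The search described can be sketched as (`fitManual` is an assumed name):

```r
fitManual <- auto.arima(tSeries, d = 1, D = 1,
                        max.p = 10, max.q = 10, max.P = 4, max.Q = 4,
                        start.p = 2, start.q = 1, start.P = 2, start.Q = 1,
                        lambda = LambdaOpt, biasadj = TRUE)
summary(fitManual)
```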
## Series: tSeries
## ARIMA(3,1,0)(3,1,1)[12]
## Box Cox transformation: lambda= -0.3308584
##
## Coefficients:
## ar1 ar2 ar3 sar1 sar2 sar3 sma1
## -0.2504 -0.0321 0.0349 -0.1628 -0.0325 -0.2535 -0.3396
## s.e. 0.0444 0.0113 0.0477 0.0350 0.0133 0.0430 0.0577
##
## sigma^2 = 1.376e-05: log likelihood = 1147.66
## AIC=-2279.33 AICc=-2278.78 BIC=-2250.42
ARIMA(3,1,0)(3,1,1)\(_{12}\) has the lowest AICc. Does it have good residuals?
As before, we need to cut the first 12:
Ljung-Box test p-values for the residuals:
Q-Q plot and histogram of the residuals:
| Hypothesis | Test | Result | P-value |
|---|---|---|---|
| Normality | Shapiro-Wilk | rejected | \(2.33 \times 10^{-5}\) |
| Unbiasedness | Wilcoxon | rejected | 0.0153604 |
| Stationarity | KPSS | not rejected | 0.0749365 |
Fitting the selected model to the first \(T-D\) points of the series to check the accuracy of the forecast on the last \(D\) points:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.5005224 3.352793 2.423655 -0.3236066 1.365898 0.1796821
## Test set -9.4839133 17.458533 12.469632 -3.5848460 4.535765 0.9244586
## ACF1 Theil's U
## Training set 0.1422908 NA
## Test set 0.7442972 0.4946601
Comparing the residuals of two ARIMAs:
##
## Diebold-Mariano Test
##
## data: resres.auto
## DM = -0.98244, Forecast horizon = 1, Loss function power = 2, p-value =
## 0.3268
## alternative hypothesis: two.sided
The Diebold-Mariano test finds no significant difference between the forecasting errors of the two ARIMAs, so the residuals of the two models have similar properties. The manually selected model has a smaller AICc and a lower test-set error, so we’ll use the manually tuned ARIMA.
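The comparison above can be sketched with `dm.test` from the forecast package (`fitAuto` and `fitManual` are assumed names for the automatic and manually tuned models):

```r
# Residuals of the two ARIMAs, first 12 dropped as before
res      <- residuals(fitManual)[-(1:12)]
res.auto <- residuals(fitAuto)[-(1:12)]
dm.test(res, res.auto)   # two-sided by default
```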
Comparing the residuals of the best ARIMA and the best ETS models:
##
## Diebold-Mariano Test
##
## data: resres.ets
## DM = -2.8839, Forecast horizon = 1, Loss function power = 2, p-value =
## 0.004239
## alternative hypothesis: two.sided
##
## Diebold-Mariano Test
##
## data: resres.ets
## DM = -2.8839, Forecast horizon = 1, Loss function power = 2, p-value =
## 0.00212
## alternative hypothesis: less
The Diebold-Mariano test says ARIMA is better. On one hand, ARIMA has non-autocorrelated residuals and a much smaller AIC. On the other, there is a small bias in its residuals, and it has a larger test-set error. There is no clear winner, but note that the train-test split in this case might not be ideal: the last 3 years, which form the test set, are quite different from the rest of the series; it seems that the nature of the trend changed. Our ARIMA model probably could not capture that change from the limited amount of information about it at the end of the training set, hence the large test-set error. Taking all of that into account, we’ll build the final forecasts with the manually tuned ARIMA.
The residuals of the model are not normal, so we use bootstrap for prediction intervals:
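The bootstrap intervals can be obtained by passing `bootstrap = TRUE` to `forecast` (a sketch; `fitManual` is an assumed name for the manually tuned ARIMA):

```r
# 3-year forecast with bootstrap prediction intervals
fl <- forecast(fitManual, h = 36, bootstrap = TRUE)
print(fl)
plot(fl)
```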
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Dec 2022 367.5217 356.2814 377.5753 349.5482 387.1945
## Jan 2023 263.2104 254.1909 272.0410 248.9811 277.7222
## Feb 2023 266.8793 256.0327 277.9776 248.8147 285.6286
## Mar 2023 286.4371 272.8020 300.0030 264.7186 309.6294
## Apr 2023 281.0016 266.5705 295.9388 258.1217 306.9199
## May 2023 278.8679 263.3225 295.1137 255.3142 305.7011
## Jun 2023 293.6610 275.6955 312.7772 265.0720 324.7496
## Jul 2023 273.6003 256.1000 291.9301 246.3855 304.3635
## Aug 2023 263.4379 245.7240 282.3711 235.7081 293.9525
## Sep 2023 270.7100 251.2122 291.9821 240.7554 304.8247
## Oct 2023 271.7881 251.2988 294.2316 240.5371 307.5925
## Nov 2023 273.1195 251.0152 296.8947 240.0961 310.0536
## Dec 2023 371.3517 335.8546 409.7660 316.9849 432.9363
## Jan 2024 268.0214 243.0485 294.8046 230.2990 311.1537
## Feb 2024 270.8639 244.4113 300.1096 230.0317 317.8827
## Mar 2024 291.2389 261.1146 324.9289 245.6595 345.6981
## Apr 2024 278.7700 249.3830 311.7301 233.1345 333.2515
## May 2024 279.8434 248.4645 314.7684 233.0355 336.4995
## Jun 2024 294.4395 259.4412 333.7530 242.1489 359.5184
## Jul 2024 276.0902 242.8413 313.3972 225.7759 340.1769
## Aug 2024 266.2150 233.0401 303.0712 217.0923 330.2678
## Sep 2024 273.7523 238.1626 314.1265 220.6080 341.1819
## Oct 2024 275.6336 238.4577 317.6452 221.0963 345.9743
## Nov 2024 275.5516 237.4967 319.5391 219.3692 347.0723
## Dec 2024 376.9150 316.2825 446.3892 287.3625 496.2840
## Jan 2025 271.4612 230.0715 317.2045 210.7249 348.7217
## Feb 2025 275.2167 232.0851 324.6320 211.2711 361.5848
## Mar 2025 295.4952 246.2135 352.1115 223.0470 391.8300
## Apr 2025 291.5972 241.7037 348.4539 219.4398 390.2921
## May 2025 291.0907 239.5515 348.8630 216.6314 392.7869
## Jun 2025 304.3050 247.7081 368.3884 222.6124 420.1398
## Jul 2025 285.0286 231.4811 345.9107 208.6564 392.7956
## Aug 2025 273.2893 221.5090 332.1905 199.2889 380.4212
## Sep 2025 281.6016 226.0041 345.1570 201.7898 394.6419
## Oct 2025 282.2073 225.6691 348.2821 200.7293 399.1934
## Nov 2025 282.6631 224.6935 350.1206 200.5288 404.6833